European Patent Office

Certificate

The attached documents are exact copies of the European patent application
described on the following page, as originally filed.

Patent application No.

03103293.1

PRIORITY DOCUMENT
SUBMITTED OR TRANSMITTED IN COMPLIANCE WITH RULE 17.1(a) OR (b)

For the President of the European Patent Office
p.o.

R C van Dijk



European Patent Office

Application no.: 03103293.1

Date of filing: 04.09.03

Applicant(s):

Koninklijke Philips Electronics N.V.
Groenewoudseweg 1
5621 BA Eindhoven
NETHERLANDS

Title of the invention:
(If no title is shown please refer to the description.)

Data processing system

Priority(ies) claimed
State/Date/File no.:

International Patent Classification:

G06F12/08

Contracting states designated at date of filing:

AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL
PT RO SE SI SK TR LI



PHNL031032EPP 

27.08.2003

Data processing system

FIELD OF THE INVENTION

The present invention relates to a data processing system comprising a
memory means and a plurality of data processing means provided for accessing said
memory means.

BACKGROUND OF THE INVENTION 

Such a system is usually called a multi-processor system, wherein the data
processing means operate to some extent independently of each other and carry out certain
processes. The plurality of data processing means have access to the memory means, so that
the memory means is shared by the plurality of data processing means. Usually, only a single
common memory means is provided in such a data processing system, and the plurality of
data processing means are provided on a single common chip, whereas the memory means
resides outside such chip as an off-chip memory. As the internal details of said processing
means are outside the scope of this invention, they will simply be referred to as intellectual
property (IP) means.

As an example of such a shared-memory data processing system, a digital
video platform (DVP) system in its basic form is shown in Figure 1. It comprises a plurality
of data processing units IP which communicate via an off-chip memory SDRAM. The data
processing units IP can be programmable devices like CPUs, application-specific hardware
blocks, subsystems with a complex internal structure etc. Further provided in the system of
Figure 1 is a device transaction level (DTL) interface DTL via which each data processing
unit IP is interfaced to a central main memory (MMI) interface MMI which arbitrates the
accesses to the off-chip memory SDRAM. All IP-to-IP communication is done via logical
buffers (not shown) mapped in the off-chip memory SDRAM. Usually, one of the data
processing units IP is a CPU (central processing unit) which manages the configuration of a
task graph by programming the data processing units via a network of memory mapped
configuration registers (not shown). Synchronization among the data processing units IP is
also handled in a centralized way by this CPU, which notifies the data processing units IP via
the memory mapped input/output network whether full or empty buffers are available. The
data processing units IP notify the CPU via interrupt lines (not shown) whether these buffers
have run empty or have been filled.

The mechanism used for synchronization has the result that the buffers provided in
the off-chip memory SDRAM must be rather large in order to keep the rate of the interrupts
to the CPU low. For example, video processing units often synchronize at a coarse grain (e.g.
a frame) even though from a functional perspective they could synchronize at a finer grain
(e.g. a line).

Since such a data processing system comprises a shared memory architecture,
there is a single address space which is accessible to all data processing means. This
simplifies the programming model. Further, the common memory means helps to provide
cost-effective system solutions.

However, such a data processing system in its basic form has a number of
disadvantages which will become more prominent as technology progresses. Namely, as the
number of data processing means increases, the number of connections to the memory
interface increases, resulting in a more complex memory interface. In particular, the
arbitration among the different data processing means becomes more complex. Further, wire
length may become a problem for the data processing means which are located far from the
memory interface, so that many long wires may cause wiring congestion as well as time delay
and power consumption problems. A further critical disadvantage is that there is a potential
bottleneck when bandwidth requirements increase further; the bandwidth to the (off-chip)
memory means is restricted by certain aspects like signalling speed and pin count of the
off-chip interconnect.

GB 2 233 480 A discloses a multi-processor data processing system wherein
each processor has a local memory. The local memories together form the main memory of
the system, and any processor can access any memory, whether it is local to that processor or
remote. Each processor has an interface circuit which determines whether a memory access
request relates to the local memory or to a remote memory, and routes the request to the
appropriate memory, wherein remote requests are routed over a bus. Whenever a write access
is made to the local memory, a dummy write request is routed over the bus to all the other
processors. Each processor monitors all write requests on the bus, and if a copy of the
location specified in the request is held in a local cache memory, such copy is invalidated so
as to ensure cache consistency.

US 5,261,067 A discloses an apparatus and a method for ensuring data cache
content integrity among parallel processors. Each processor has a data cache to store results
of intermediate calculations. The data caches of the processors are synchronized with each
other through the use of synchronization intervals. During entry of a synchronization interval,
modified data variables contained in an individual cache are written back to a shared
memory. The unmodified data contained in a data cache is flushed from the memory. During
exiting of a synchronization interval, data variables which were not modified since entry into
the synchronization interval are also flushed. By retaining modified data cache values in the
individual processors which computed the modified values, unnecessary access to shared
memory is avoided.

US 6,253,290 B1 describes a multiprocessor system having a plurality of
processor units each including a CPU and a local cache memory connected to the CPU. The
CPUs have their shared bus terminals connected to a global shared bus, and local cache
memories have their bus terminals connected to a global unshared bus. The global shared bus
is connected to an external shared memory for storing shared information used in common by
the CPUs, and the global unshared bus is connected to an external unshared memory for
storing unshared information used by the CPUs.

US 6,282,708 B1 discloses a method for structuring a multi-instruction
computer program as containing a plurality of basic blocks which each compose from
internal instructions and external instructions organized in an internal directed acyclic graph.
A guarding is executed on successor instructions which each collectively emanate from a
respectively associated single predecessor instruction. A subset of joined instructions which
converge onto a single join/target instruction are then unconditionally joined. This is
accomplished by letting each respective instruction in the subset of joined instructions be
executed under mutually non-related conditions, specifying all operations with respect to a
jump instruction, specifying all operations which must have been executed previously and
linking various basic blocks comprising subsets of successor instructions in a directed acyclic
graph which allows parallel execution of any further subset of instructions contained therein.

SUMMARY OF THE INVENTION 

An object of the present invention is to overcome the above mentioned
drawbacks and to improve the data processing system so as to avoid a communication
bottleneck between the data processing means and the memory means even when the bandwidth
requirements increase further, to decrease the number of connections to the memory
interface, and to reduce the wire length.



In order to achieve the above and further objects, according to the present
invention there is provided a data processing system comprising a memory means and a
plurality of data processing means provided for accessing said memory means,
characterized by a communication interface means coupled between said memory means and
said plurality of data processing means, said communication interface means including a
network of nodes, each node comprising at least one slave port for receiving a memory
access request from a data processing means or from a previous node and at least one master
port for issuing a memory access request to a next node or to said memory means in
accordance with the memory access request received at said slave port, wherein said at least
one slave port is connected to a master port of a previous node or to one of said data
processing means and said at least one master port is connected to a slave port of a next node
or to said memory means.

Due to the construction according to the invention, the number of connections
to the memory means is reduced. This is achieved by the provision of a specific physical
organisation of a logically shared memory architecture wherein the communication interface
means includes a network of a plurality of nodes having slave ports receiving memory access
requests from the data processing means and at least one master port issuing a specific
memory access request to the memory means. Typically, the number of master ports of a
node is smaller than the number of slave ports of that node. As a result, the complexity of the
memory interface is decreased since the number of clients connected thereto is reduced.
Further, due to the interconnection of the communication interface means, the length of
individual wires and, thus, the total length of wires are reduced so as to help avoid wire
congestion.

The communication interface means includes a network of a plurality of node
means, wherein each node means comprises at least one slave port for receiving a memory
access request and at least one master port for issuing a memory access request in accordance
with the memory access request received at said slave port(s), wherein the number of said
slave ports can be higher than the number of said master ports. So, the communication
interface means according to the present invention includes a node structure for the
connections between the data processing means and the memory means wherein multiple
data processing means can be connected to a node means via its slave ports, whereas each
node means has only one or a few master ports. Since the slave ports of a node means are
uniform in that they offer the same services, it is transparent to the node means whether a
slave port attaches a data processing means or another node means. A request for a memory
access issued by a data processing means is passed to one of the node means to which it is
connected. An advantage of the concept of the invention is that it can be introduced in a
step-wise manner. Namely, a first chip to adopt the concept of the invention could use a new node
means for just a few data processing means; and in later chips, the number of node means
may gradually rise, and the facilities of the communication interface means of the invention
may be used for more and more communication between data processing means. In case a
node means has multiple master ports, a single port is selected for forwarding, for example,
in accordance with an address range discrimination.
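
The selection of a single master port by address range discrimination can be illustrated with a minimal Python sketch. It is not taken from the application: the class name `MasterPort`, the function `select_master_port`, the port names and the address ranges are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MasterPort:
    """One master port of a node, serving a half-open address range (assumed layout)."""
    name: str
    base: int   # first address forwarded through this port (inclusive)
    limit: int  # end of the range (exclusive)


def select_master_port(ports, address):
    """Return the single master port whose address range contains `address`."""
    for port in ports:
        if port.base <= address < port.limit:
            return port
    raise ValueError(f"no master port serves address {address:#x}")


# Hypothetical node with two master ports, each leading to a different memory section.
ports = [
    MasterPort("m0", 0x0000_0000, 0x4000_0000),
    MasterPort("m1", 0x4000_0000, 0x8000_0000),
]
print(select_master_port(ports, 0x4000_0010).name)
```

A node with only one master port needs no such lookup, which is the simplification noted for the tree-structured arrangement.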

In a first embodiment of the invention, each of the slave ports of said node
means is connected to one of said plurality of data processing means and the master ports of
said node means are coupled to said memory means. So, all the node means are provided in
the same level, and the data processing means are coupled to the memory means via such
single level of node means. The data processing means can issue memory access requests to
an associated node means which forwards the request to the memory means.

In an alternative second embodiment, the network of node means is 
hierarchically structured having the data processing means as leaves and the node means as 
nodes. 

The plurality of node means can be arranged in a directed acyclic graph
structure. Each edge of the directed acyclic graph structure corresponds to an interconnect
path which serves memory access requests. The edges are directed, and each edge connects a
master port of one node means to a slave port of another node means. In case the memory
means includes a plurality of memory sections, the acyclic graph structure can result in a set
of master ports each providing a connection to a different memory section, in such a way that
each data processing means can communicate to one or more memory section(s), thereby
allowing multiple data processing means to access different memory sections concurrently,
thereby reducing the bandwidth bottleneck. Furthermore, the acyclic graph structure might
provide several different paths through the graph leading from one data processing means to
one memory section. These different paths can be employed advantageously to further reduce
communication bottlenecks or to avoid the use of faulty connections.

Further, in a preferred refinement of the above embodiment, the plurality of
node means may be arranged in a tree structure so that one or more data processing means
and/or previous node(s) are connected to a node means via its slave ports, but each node
means has only one master port. This simplifies the forwarding process in the node means
and provides connectivity to a single memory section at the root of the node tree that can be
accessed by all data processing means.

In this alternative second modification, preferably the plurality of node means
include n groups of node means with n ≥ 2, wherein each of the slave ports of the node
means of a first group is connected to one of said plurality of data processing means, the
master ports of the node means of the n-th group are coupled to said memory means, and each
of the slave ports of the node means of the n-th group is connected to the master port of the
node means of the (n-1)-th group. So, the plurality of node means are divided into n groups,
wherein each group defines a different level in the structure. If a node means receives a
memory access request at one of its slave ports, the request is forwarded to a node means of a
higher group or, if the node means is in the highest (i.e. n-th) group, to the memory means.
With respect thereto, it should be added that the node structure does not necessarily need
to have a uniform depth. Some data processing means may be "close" to the memory means
in that only one or two nodes separate them from the memory means, whereas (at the same
time) other data processing means may be more "remote" from the memory means in that the
memory access requests that they issue have to travel via a large number of nodes.
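
The non-uniform depth of such a node structure can be made concrete with a small Python sketch. The topology below is an assumption made for illustration only: the unit names `IP_a`, `IP_b`, `IP_c`, the table `PARENT` and the function `nodes_on_route` do not appear in the application.

```python
# Hypothetical topology: PARENT[x] is the node (or the memory) that the master
# port of x is connected to. "MEMORY" stands for the memory means.
PARENT = {
    "IP_a": "H11",     # units attached to a first-group node
    "IP_b": "H11",
    "IP_c": "H12",
    "H11": "H2",       # first-group node feeding a second-group node
    "H12": "MEMORY",   # first-group node attached directly to the memory
    "H2": "MEMORY",
}


def nodes_on_route(unit, parent=PARENT):
    """Return the node means a request from `unit` passes before reaching the memory."""
    route = []
    current = parent[unit]
    while current != "MEMORY":
        route.append(current)
        current = parent[current]
    return route


for unit in ("IP_a", "IP_c"):
    print(unit, "->", nodes_on_route(unit))
```

In this assumed layout a request from `IP_a` crosses two nodes while a request from `IP_c` crosses only one, so the two units sit at different distances from the memory means.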

The above mentioned hierarchical structure of the node means is completely 
transparent to the data processing means so that no modifications of the data processing 
means are required. 

Preferably, the node means are hubs. 

In a further preferred embodiment of the invention, at least one local memory
unit is attached to the communication interface, allowing such local memory unit to be
selectively accessed by a memory access request. So, a single address space is distributed
over a global memory and such local memory unit(s). The advantage of this embodiment is
that the data processing means may exchange data with each other via a local memory unit only
instead of using the global memory, resulting in a further reduction of a potential bottleneck
risk in the communication, by reducing memory access latency, reducing power
consumption, and reducing the use of external memory bandwidth.

Preferably, at least one node means further comprises at least one memory port
to which a local memory unit is connected. So, the node means checks whether or not the
memory access request refers to the local memory unit(s) attached to such node means by
comparing the address of the memory access request with the address ranges associated with the
local memory unit(s) attached to such node means. If yes, the memory access is executed on
the selected local memory unit. Otherwise, the node means forwards the memory access
request through one of its master ports either to a next node means, where the check and
"access or forward" is repeated if a local memory unit is also attached to such next node
means, or to the memory means.
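
The "access or forward" check described above can be sketched as follows. This is a behavioural illustration under stated assumptions, not the claimed hardware: the classes `GlobalMemory` and `Hub`, the single master port per node, the dictionary-based storage and the chosen address range are all hypothetical.

```python
class GlobalMemory:
    """Stand-in for the memory means: a sparse address -> value store."""

    def __init__(self):
        self.cells = {}

    def access(self, address, value=None):
        # A value of None means "read"; anything else is written first.
        if value is not None:
            self.cells[address] = value
        return self.cells.get(address, 0)


class Hub:
    """Node with one master port and an optional local memory unit (assumed model)."""

    def __init__(self, name, master, local_base=None, local_size=0):
        self.name = name
        self.master = master            # next node or the global memory
        self.local_base = local_base    # start of the local address range, if any
        self.local_size = local_size
        self.local = {}
        self.served_locally = 0

    def _is_local(self, address):
        if self.local_base is None:
            return False
        return self.local_base <= address < self.local_base + self.local_size

    def access(self, address, value=None):
        if self._is_local(address):
            # Address falls in the range of the attached local memory unit.
            self.served_locally += 1
            if value is not None:
                self.local[address] = value
            return self.local.get(address, 0)
        # Otherwise forward the request through the master port.
        return self.master.access(address, value)


sdram = GlobalMemory()
h2 = Hub("H2", sdram)
h11 = Hub("H11", h2, local_base=0x8000_0000, local_size=0x1000)
h11.access(0x8000_0010, 42)   # stays in the local memory of H11
h11.access(0x0000_0100, 7)    # forwarded via H2 to the global memory
print(h11.served_locally, sdram.cells)
```

The issuing unit uses the same `access` call in both cases, which mirrors the point that the physical location of the data is hidden from the data processing means.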

In a modification of the last-mentioned embodiment, the communication
interface means includes a cache controller means for controlling the local memory unit(s) as
a cache memory. In this case at least a part of the local memory unit is used to locally store a
copy of data residing in other memories reachable via one of its master ports. So, memory
access requests can be served locally for a wider address range for which caching behaviour
is enabled.

Moreover, the communication interface means may include at least one
synchronization means for streaming communication between data processing means. In
particular, at least one node means includes said synchronization means for streaming
communication between the data processing means directly or indirectly coupled to said node
means.

In case a local memory unit is attached to the node means, the local memory
unit should have a first-in/first-out (FIFO) function, and the synchronization means
comprises a FIFO administration means for controlling said local memory unit(s). The
synchronization services can be locally handled when the FIFO administration is locally
stored, whereby multiple data processing means can communicate data via the local memory
unit(s) attached to the node means; otherwise the synchronization request is forwarded to one
of the master ports of the corresponding node means.

In a further preferred embodiment, the communication interface means is
provided on a single chip. Moreover, at least a part of the plurality of data processing means
may be additionally provided on said single chip.

The above described objects and other aspects of the present invention will be
better understood by the following description and the accompanying figures.

BRIEF DESCRIPTION OF THE DRAWINGS 

Preferred embodiments of the present invention are described with reference to
the drawings in which

Fig. 1 shows a schematic basic block diagram of a DVP system in its basic 
form according to the prior art; 

Fig. 2 shows a schematic basic block diagram of a DVP system including a
hub structure according to a first preferred embodiment of the present invention;

Fig. 3 shows a schematic basic block diagram of a DVP system with a hub
structure according to a second preferred embodiment of the present invention which further
includes on-chip local memories; and

Fig. 4 shows a schematic basic block diagram of a DVP system including a
hub structure according to a third preferred embodiment of the present invention.

DESCRIPTION OF PREFERRED EMBODIMENTS 

Figure 2 shows a digital video platform (DVP) system having a hub structure
according to a first preferred embodiment of the present invention. Like the system shown in
Figure 1, the system of Figure 2 comprises a plurality of data processing units IP, which may
also be called intellectual property (IP) units, and a memory SDRAM. The data processing
units IP can be programmable devices (CPUs), application-specific hardware blocks,
subsystems with a complex internal structure etc. All data processing units comprise a device
transaction level (DTL) interface. Further, there are provided a plurality of hubs H11, H12, H2,
wherein each hub comprises several slave ports s and one master port m. In the system of
Figure 2, the hubs define a network of hubs comprising a first group of hubs H11 and H12 and
a second group consisting of only one hub H2. The first group of hubs H11 and H12 define a
first level adjacent to the data processing units IP so that the hubs H11 and H12 of the first
group are directly connected to the data processing units IP via their slave ports s. In the
embodiment of Figure 2, each hub has only one master port m for connection to the next hub
or the memory interface MMI. The hubs H11 of the first group are connected via their master
ports m to the slave ports s of the hub H2 of the second group, which is connected via its
master port m to the memory interface MMI, whereas the hubs H12 of the first group are
directly connected via their master ports m to the memory interface MMI. The memory
interface MMI is coupled to the memory SDRAM.

In the embodiment of Figure 2, the network of hubs is organized as a directed
acyclic graph (DAG) structure wherein the nodes of the DAG structure are defined by the
hubs H11, H12 and H2, and each edge of the DAG structure corresponds to an interconnect path
which serves memory access requests. The edges are directed. The DAG structure shown in
Figure 2 is restricted to have a tree structure wherein each hub H11, H12 and H2 has one master
port only. This simplifies the forwarding process in the hubs, as they do not need to select a
master port for each memory request, for example by discriminating ranges on the requested
address.

At the slave ports s of the hubs a memory access request is received which is
forwarded by the master port m. The slave ports s of a hub are uniform in that they offer the
same services. Therefore, it is transparent to the hub whether a slave port s is connected to a
data processing unit IP or another hub.
It is noted that many modifications of the structure over the embodiment of
Figure 2 are possible and that Figure 2 gives only one example. So, the network of hubs may
include more than two levels. Further, the hubs H11 may be connected to a next hub higher in
the structure. Moreover, a hub may have multiple master ports wherein a single master port is
selected for forwarding, for example in accordance with an address range discrimination.

As further shown in Figure 2, the data processing units, the device transaction
level DTL, the hubs H11, H12 and H2 and the memory interface MMI reside on a single chip
C, whereas the memory SDRAM is provided outside the chip C.

Due to the hub structure shown in Figure 2, the number of clients of the
memory interface MMI is reduced. Further, the length of individual wires and, thus, the total
length of wires are reduced, whereby wire congestion is avoided.

The hierarchical hub structure is completely transparent to the data processing
units IP, and no modifications of the data processing units IP are required. Also it does not
affect the way synchronization is performed; this may still be handled in a centralized way by
means of a memory mapped input/output (not shown) and interrupts via a data processing
unit including a CPU.

Figure 3 shows a second preferred embodiment of the present invention which
differs from the first embodiment of Figure 2 in that an embedded local memory MEM is
attached to each of the hubs. However, with respect thereto it should be noted that in an
alternative embodiment local memories are provided for only some of the hubs. The local
memories MEM are assigned private segments in the address space, wherein a single address
space is provided which is distributed over the off-chip memory SDRAM and the multiple
local memories MEM. The data processing units IP perform memory access requests in the
usual address-based way, wherein an address can refer to the off-chip memory SDRAM or to
an on-chip local memory MEM. All data processing units IP are able to access the off-chip
memory SDRAM and a further on-chip local memory MEM attached to the memory
interface MMI, but not all data processing units IP are able to access all on-chip local
memories MEM attached to the hubs H11, H12 and H2. Namely, the data processing units IP
can access only the local memories MEM attached to the hubs on the route to the memory
interface MMI.

For the data processing units IP sharing a common hub, a buffered
communication can be performed via the local memory MEM attached to that hub. A
first-in/first-out (FIFO) buffer used for a communication between the data processing units is
mapped to a memory segment of the local memory MEM attached to a common hub,
preferably to the common hub H11 or H12 of the first group being the closest common hub.
The data processing units IP are simply instructed to access the data in the address range in
which such buffer has been allocated via their DTL interface, but are unaware of the physical
location; this is transparent to the data processing units IP. The hubs check the addresses of
the memory access requests performed by the data processing units IP and either perform an
access to their local memory MEM, if the address is in the specified range, or forward the
request up in the hierarchy. The buffers for the IP-to-IP communication can be allocated to
the local memories MEM upon configuration, wherein the address range of the buffer is
programmed in the hub to which the local memory MEM is attached.
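
Choosing where to allocate a buffer amounts to finding the closest node that lies on the route of both communicating units. The following Python sketch shows one way to compute it. It is an illustration under assumptions: the topology, the suffixed hub names `H11_left` and `H11_right`, the unit names and both function names are hypothetical and not part of the application.

```python
# Hypothetical tree in the spirit of Figure 3: PARENT[x] is where the master
# port of x leads. "MMI" is the memory interface at the root.
PARENT = {
    "IP_a": "H11_left", "IP_b": "H11_left",
    "IP_c": "H11_right",
    "IP_d": "H12",
    "H11_left": "H2", "H11_right": "H2",
    "H2": "MMI", "H12": "MMI",
}


def route(unit, parent):
    """List the nodes from `unit` up to the root, nearest first."""
    path = []
    while unit in parent:
        unit = parent[unit]
        path.append(unit)
    return path


def closest_common_node(a, b, parent):
    """Return the nearest node that lies on the routes of both units, or None."""
    on_route_a = set(route(a, parent))
    for node in route(b, parent):
        if node in on_route_a:
            return node
    return None


print(closest_common_node("IP_a", "IP_b", PARENT))
print(closest_common_node("IP_a", "IP_c", PARENT))
print(closest_common_node("IP_a", "IP_d", PARENT))
```

A buffer placed at the returned node is reachable by both units, since each unit can access only the local memories on its own route to the memory interface.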

The IP-to-IP communication which is mapped to a communication via an
on-chip local memory MEM does not consume expensive bandwidth to the off-chip memory
SDRAM. The hub structure should be chosen such that the data processing units IP which
need to communicate frequently have a common hub for a communication via the local
memory attached to such common hub. For example, video data processing units IP should
be in the same sub-tree of the hub structure. Moreover, the total available memory bandwidth
is increased significantly. In parallel to accesses to the off-chip memory SDRAM by some
data processing units IP, accesses to the on-chip local memories may be performed by data
processing units IP located in other (disjunct) sub-trees. Further, the communication via the
on-chip local memories MEM is more power efficient and can more easily support higher
bandwidth (wider interconnects and memory ports, higher clocks).

As explained above, the data processing units IP can simply access data via
their DTL interface, irrespective of whether that data is located on-chip (local memory
MEM) or off-chip (memory SDRAM). Hence, on-chip communication is transparent to the
data processing units IP. This facilitates re-use of the data processing units IP in the proposed
configuration.

Special attention should be paid to the synchronization among the data
processing units IP. As already mentioned above, in the DVP system synchronization is
performed by a data processing unit including a CPU, requiring low rate synchronization at a
coarse data grain. This results in larger buffer sizes which can easily be accommodated in the
off-chip SDRAM. For on-chip communication, however, smaller buffers should be used,
which requires synchronization at a finer data grain. For the functions performed by many
data processing units, synchronization at a finer grain (e.g. line or macro block) is merely
logical. However, using an interrupt-based scheme for synchronization at a finer grain would
lead to a higher interrupt rate on the CPU.
One solution may be to dedicate more CPU power to synchronization or even
to dedicate a special (light-weight) CPU to the synchronization task.

Another attractive solution is to add synchronization support to the hubs. In
this case, each hub can perform the synchronization tasks which are related to the
(FIFO-based) IP-to-IP communication via the local memory MEM attached to such hub; i.e., per
first-in/first-out operation the availability of data and room is administered and signalled to
the data processing units IP. As an effect, this supports continuous (autonomous) operation of
the data processing units without CPU software intervention, at least as far as communication
via local memories is involved.
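
The administration of "data" and "room" for one FIFO can be sketched in Python as below. This is only an illustration of the bookkeeping idea under assumed conventions: the class `FifoAdmin`, its method names and the slot-based accounting are hypothetical, and the application does not specify how a hub stores or signals these counts.

```python
class FifoAdmin:
    """Bookkeeping for one FIFO buffer held in a local memory (assumed model).

    `filled` is the amount of data available to the consumer; the remaining
    slots are the room available to the producer.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.filled = 0
        self.write_index = 0
        self.read_index = 0

    def room(self):
        return self.capacity - self.filled

    def data(self):
        return self.filled

    def put(self, n=1):
        """Producer commits n written slots; returns False if room is lacking."""
        if n > self.room():
            return False
        self.write_index = (self.write_index + n) % self.capacity
        self.filled += n
        return True

    def get(self, n=1):
        """Consumer releases n read slots; returns False if data is lacking."""
        if n > self.filled:
            return False
        self.read_index = (self.read_index + n) % self.capacity
        self.filled -= n
        return True


fifo = FifoAdmin(capacity=4)
print(fifo.get())      # no data yet, so the consumer has to wait
print(fifo.put(3))     # producer fills three slots
print(fifo.room(), fifo.data())
```

A `False` result stands for the case in which the hub would signal the unit to wait; in the scheme described above that decision is taken locally, without interrupting the CPU.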

It is proposed that the data processing units IP need to make abstract
synchronization calls at their ports. The infrastructure decides how these synchronization
calls are resolved, dependent on how the data processing unit IP is integrated in a
system-on-chip. This would not be hardwired in a re-usable data processing unit. For example, if such a
port maps to local communication via a local memory, the corresponding hub resolves the
synchronization call locally or forwards it to the next higher hub. If the communication is
done via the off-chip memory SDRAM, an interrupt may be generated. For the data
processing unit IP this is hidden by an "abstract" interface (not shown) which is configured
such that it offers a function to the data processing units IP to issue high level requests to the
communication interface, but hides from the data processing units how such requests are
implemented. When a buffer is provided in the local memory MEM attached to a hub, the
port addresses for the synchronization are programmed in the hub together with the address
range of the buffer at the configuration time, accordingly.

It is noted that in the scheme presented above the local memories MEM are
used for buffered IP-to-IP communication in which no data needs to travel to the off-chip
SDRAM at all. This is different from the use of the on-chip local memories as a cache for
copying data from the off-chip SDRAM into an on-chip local memory MEM for repeated
use.

However, the architecture of Figure 3 could be used to support caching as
well, wherein two kinds of caching are distinguished: transparent caching and IP-controlled
caching.

With transparent caching, the data processing units IP are not really aware
that data are copied to a local memory MEM, other than that they perceive a different
latency in accessing the data. By means of a cache control function, data are copied
from/to the off-chip memory SDRAM to/from an on-chip local memory MEM. Such cache control
is implemented in a hub. Cache coherency must be resolved either by implementing a hardware
cache coherency scheme or by adopting restrictions in the programming model.
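
A minimal software model of transparent caching in a hub is given below. It is a sketch under assumptions that the description does not fix: the class `CachingHub`, the first-in/first-out eviction and the write-through policy are hypothetical choices made only to keep the example short.

```python
class CachingHub:
    """Hypothetical hub with a cache control function over a local memory MEM."""

    def __init__(self, sdram, lines):
        self.sdram = sdram   # off-chip memory modelled as a dict: address -> value
        self.lines = lines   # number of entries MEM can hold (assumed)
        self.mem = {}        # on-chip copies, keyed by the SDRAM address
        self.hits = 0
        self.misses = 0

    def read(self, addr):
        """The IP uses the SDRAM address; only the latency would differ."""
        if addr in self.mem:
            self.hits += 1
            return self.mem[addr]
        self.misses += 1
        if len(self.mem) >= self.lines:
            # Evict the oldest entry (dicts keep insertion order).
            del self.mem[next(iter(self.mem))]
        self.mem[addr] = self.sdram[addr]
        return self.mem[addr]

    def write(self, addr, value):
        """Write-through: SDRAM and any on-chip copy stay identical."""
        self.sdram[addr] = value
        if addr in self.mem:
            self.mem[addr] = value


sdram = {0x100: 7, 0x104: 9, 0x108: 11}
hub = CachingHub(sdram, lines=2)
first = hub.read(0x100)    # miss: copied from SDRAM into MEM
second = hub.read(0x100)   # hit: served from MEM
hub.write(0x100, 8)        # updates both copies
third = hub.read(0x100)    # hit with the updated value
```

The data processing unit always addresses the SDRAM range; the copy in MEM is visible to it only as a shorter access time.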

With IP-controlled caching, a data processing unit IP itself copies data from
the off-chip memory SDRAM to an on-chip local memory for repeated (fine-grain) use. The
copies in the on-chip local memory MEM are in a different address range than the
corresponding data in the off-chip memory SDRAM. With IP-controlled caching, the data
processing units IP are responsible for the coherency. As an example of IP-controlled
caching, consider a three-dimensional graphics renderer which 'caches' texture
data in an on-chip local memory MEM to perform fine-grain accesses on it. This is done by
copying the texture data to the address range of the local memory MEM attached to a hub
and referring to addresses in that address range when performing the fine-grain accesses. In
such case, the cache control is performed by the data processing units IP themselves, and
this kind of use is different from the transparent use of the on-chip local memory for
IP-to-IP communication. Nevertheless, the architecture of Figure 3 supports this kind of use.
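
The texture example can be made concrete with a small sketch. The base addresses, the flat `memory` dictionary and the helper `load_texture` are assumptions introduced for illustration; the description only states that the copies reside in a different address range and that the data processing unit is responsible for coherency.

```python
SDRAM_BASE = 0x8000_0000   # assumed address range of the off-chip SDRAM
MEM_BASE = 0x1000_0000     # assumed, distinct address range of the local MEM

memory = {}                # flat address space: address -> value


def load_texture(src, dst, count):
    """The IP itself copies `count` words from SDRAM to the MEM address range."""
    for i in range(count):
        memory[dst + i] = memory[src + i]


# Texture data initially lives in off-chip SDRAM.
for i in range(4):
    memory[SDRAM_BASE + i] = 10 * i

load_texture(SDRAM_BASE, MEM_BASE, 4)

# Fine-grain accesses now refer to addresses in the MEM range.
texels = [memory[MEM_BASE + i] for i in (3, 1, 3, 0)]

# Coherency is the IP's job: a later SDRAM update is not seen in MEM
# until the IP copies the data again.
memory[SDRAM_BASE + 1] = 99
stale = memory[MEM_BASE + 1]
load_texture(SDRAM_BASE, MEM_BASE, 4)
fresh = memory[MEM_BASE + 1]
```

Unlike transparent caching, the renderer here must know both address ranges and must decide itself when a copy is out of date.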

Figure 4 shows a third preferred embodiment which differs from the first
embodiment of figure 2 in that, in addition to a first off-chip memory SDRAM 1 and an
associated first memory interface MMI 1 coupled thereto, a second off-chip memory SDRAM 2
and an associated second memory interface MMI 2 coupled thereto are provided. Further,
there are provided a plurality of hubs H11, H12, H21 and H22, wherein the hubs H11 and H12
each comprise several slave ports s and two master ports m and the hubs H21 and H22 each
comprise two slave ports s and one master port m. So, in the system of figure 4, the hubs
define a network of hubs comprising a first group of hubs H11 and H12 and a second group
consisting of hubs H21 and H22. The first group of hubs H11 and H12 defines a first level
adjacent to the data processing units IP so that the hubs H11 and H12 of the first group are
directly connected to the data processing units IP via their slave ports s. The second group
of hubs H21 and H22 defines a second level adjacent to the memory interfaces MMI 1 and MMI 2,
wherein the hubs H21 and H22 are each connected via one of their slave ports s to one of the
master ports m of the hub H11 and further via the other one of their slave ports s to one of
the master ports m of the hub H12. Moreover, the hub H21 is connected via its master port m
to the first memory interface MMI 1, and the hub H22 is connected via its master port m to
the second memory interface MMI 2.

So, in the third embodiment of figure 4, the network of hubs is organized as a
DAG structure like in the first and second embodiments of figures 2 and 3, respectively, but
the data processing units IP have optional access to two off-chip memories SDRAM 1 and
SDRAM 2, whereas in the first and second embodiments of figures 2 and 3 there is only
communication between a single off-chip memory SDRAM and the data processing units IP.
Apart from the fact that two off-chip memories SDRAM 1 and SDRAM 2 are connected to the
network of hubs, the operation of the third embodiment of figure 4 is the same as that of the
first embodiment of figure 2 so that reference is made to the description of figure 2. Further,
in the third embodiment of figure 4 local memories and a synchronisation as described in
conjunction with figure 3 may optionally be provided as well.
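
The hub network of figure 4 can be written down as a small graph to check the two properties stated above, namely that it is a directed acyclic graph and that both memory interfaces are reachable from each first-level hub. The dictionary `links` and the helper functions are hypothetical names; the connections are transcribed from the description of figure 4, with the data processing units IP omitted.

```python
# Edges point from a master port to the slave port (or memory interface) it drives.
links = {
    "H11": ["H21", "H22"],
    "H12": ["H21", "H22"],
    "H21": ["MMI 1"],
    "H22": ["MMI 2"],
    "MMI 1": [],
    "MMI 2": [],
}


def reachable(start):
    """All nodes reachable from `start` by following master-port links."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in links[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_acyclic():
    """True if no node can reach itself, i.e. the network is a DAG."""
    return all(node not in reachable(node) for node in links)


memories_from_h11 = sorted(n for n in reachable("H11") if n.startswith("MMI"))
memories_from_h12 = sorted(n for n in reachable("H12") if n.startswith("MMI"))
dag = is_acyclic()
```

Because each first-level hub has two master ports, the structure is a DAG but not a tree: H21 and H22 each have two predecessors.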

As described above, the third embodiment of figure 4 comprises two off-chip
memories SDRAM 1 and SDRAM 2. However, more than two off-chip memories may be
provided as well.

The above description has presented a next step in the evolution of a
multi-processing data system like a DVP platform, which allows the use of on-chip local
memories MEM to avoid a communication bottleneck to the off-chip memory SDRAM. The
use of the local memories MEM for IP-to-IP communication is largely transparent to the data
processing units IP.

A further advantage of the above described architecture according to the
present invention is that it can be introduced in a stepwise manner. A first chip to adopt the
solution can use an on-chip local memory MEM and a new hub (e.g. H11) for just a few data
processing units IP which want to communicate via such local memory MEM. In later chips,
the number of hubs with local memories may gradually rise, and the on-chip communication
facilities may be used for more and more IP-to-IP communication.

Although the invention is described above with reference to examples shown
in the attached drawings, it is apparent that the invention is not restricted to them but can
vary in many ways within the scope disclosed in the attached claims.

CLAIMS:



1. A data processing system, comprising a memory means (SDRAM) and a
plurality of data processing means (IP) provided for accessing to said memory means
(SDRAM), characterized by a communication interface means coupled between said memory
means (SDRAM) and said plurality of data processing means (IP), said communication
interface means including a network of nodes (H11, H12, H2), each node comprising at least
one slave port (s) for receiving a memory access request from a data processing means (IP) or
from a previous node and at least one master port (m) for issuing a memory access request
to a next node or to said memory means (SDRAM) in accordance with the memory access
request received at said slave port (s), wherein said at least one slave port (s) is connected to
a master port (m) of a previous node or to one of said data processing means (IP) and said at
least one master port (m) is connected to a slave port (s) of a next node or to said memory
means (SDRAM).

2. The data processing system according to claim 1, characterized in that at each
node means the number of said slave ports (s) is higher than the number of said master ports
(m).

3. The data processing system according to claim 1 or 2, characterized in that
said network of node means (H11, H12, H2) is hierarchically structured.

4. The data processing system according to claim 3, characterized in that said
plurality of node means (H11, H12, H2) are arranged in a directed acyclic graph structure.

5. The data processing system according to claim 4, characterized in that said
plurality of node means (H11, H12, H2) are arranged in a tree structure.

6. The data processing system according to at least any one of the preceding
claims, characterized in that said plurality of node means (H11, H12, H2) include n groups of
node means with n > 2, wherein each of the slave ports (s) of the node means (H11) of a first
group is connected to one of said plurality of data processing means (IP), the master ports
(m) of the node means (H2) of the nth group are coupled to said memory means (SDRAM),
and each of the slave ports (s) of the node means (H2) of the nth group is connected to a
master port (m) of the node means (H11) of the (n-1)th group.

7. The data processing system according to at least any one of the preceding
claims, characterized in that said node means (H11, H12, H2) are hubs.

8. The data processing system according to at least any one of the preceding
claims, characterized in that said communication interface means further includes at least one
local memory unit (MEM) adapted to be selectively accessed to by a memory access request.

9. The data processing system according to claim 8, characterized in that at least
one node means (H11, H12, H2) further comprise at least one memory port (mp) to which a
local memory unit (MEM) is connected.

10. The data processing system according to claim 8 or 9, characterized in that
said communication interface means includes a cache controller means for controlling at least
a section of the local memory unit(s) (MEM) as a cache memory.

11. The data processing system according to at least any one of the preceding
claims, characterized in that said communication interface means further includes at least one
synchronization means for streaming communication between data processing means (IP).

12. The data processing system according to claim 11, characterized in that at least
one node means (H11, H12, H2) includes said synchronization means for streaming
communication between the data processing means (IP) directly or indirectly coupled to said
node means.

13. The data processing system according to claim 8 as well as to claim 11 or 12,
characterized in that the local memory unit(s) (MEM) is (are) configured to provide the
storage means for a first-in/first-out function and said synchronization means comprises a
first-in/first-out administration means for controlling said local memory unit(s) (MEM).

14. The data processing system according to at least any one of the preceding
claims, characterized in that said communication interface means is provided on a single chip
(C).

15. The data processing system according to claim 14, characterized in that at least
a part of said plurality of data processing means (IP) is additionally provided on said single
chip (C).

ABSTRACT:



The disclosed data processing system comprises a memory means (SDRAM),
a plurality of data processing means (IP) provided for accessing to said memory means
(SDRAM), and a communication interface means coupled between said memory means
(SDRAM) and said plurality of data processing means (IP), said communication interface
means including a network of nodes (H11, H12, H2), each node comprising at least one slave
port (s) for receiving a memory access request from a data processing means (IP) or from a
previous node and at least one master port (m) for issuing a memory access request to a next
node or to said memory means (SDRAM) in accordance with the memory access request
received at said slave port (s), wherein said at least one slave port (s) is connected to a master
port (m) of a previous node or to one of said data processing means (IP) and said at least one
master port (m) is connected to a slave port (s) of a next node or to said memory means
(SDRAM).